Joint prosodic and segmental unit selection speech synthesis

نویسندگان

  • Robert A. J. Clark
  • Simon King
چکیده

We describe a unit selection technique for text-to-speech synthesis which jointly searches the space of possible diphone sequences and the space of possible prosodic unit sequences in order to produce synthetic speech with more natural prosody. We demonstrates that this search, although currently computationally expensive, can achieve improved intonation compared to a baseline in which only the space of possible diphone sequences is searched. We discuss ways in which the search could be made sufficiently efficient for use in a real-time system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Joint prosodic and segmental unit selection for expressive speech synthesis

One problem in concatenative speech synthesis is how to incorporate prosodic factors in the unit selection. Imposing a predicted prosodic contour as target specification is errorprone and does not benefit from the natural variability contained in the database. This paper introduces a method that searches for the optimal unit sequence by maximizing a joint likelihood at both segmental and prosod...

متن کامل

Towards intonation control in unit selection speech synthesis

We propose to control intonation in unit selection speech synthesis with a mixed CART-HMM intonation model. The Finite State Machine (FSM) formulation is suited to incorporate the intonation model in the unit selection framework because it allows for combination of models with different unit types and handling competing intonative variants. Subjective experiments have been carried out to compar...

متن کامل

Unit Selection Speech Synthesis Using Phonetic-Prosodic Description of Speech Databases

This paper describes an approach to speech synthesis based on using speech databases at different stages of TTS process. Speech database units are phones in different segmental and prosodic contexts. Pitch synchronous segmentation and labeling of databases allows storing both segmental and prosodic information. Phonetic-prosodic annotations of speech databases are involved in off-line training ...

متن کامل

F0 contour and segmental duration modeling using prosodic features

This paper proposes a framework of F0 contour generation and segmental duration modeling for application in a unit-selection speech synthesis system for Polish – BOSS. We describe the design of the F0 and duration modeling modules and emphasize the role of prosodic features (related to stress, pitch accent and phrase) in these two tasks.

متن کامل

Including pitch accent optionality in unit selection text-to-speech synthesis

A significant variability in pitch accent placement is found when comparing the patterns of prosodic prominence realized by different English speakers reading the same sentences. In this paper we describe a simple approach to incorporate this variability to synthesize prosodic prominence in unit selection text-to-speech synthesis. The main motivation of our approach is that by taking into accou...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006